Abstract

In the online channel coding model, a sender wishes to communicate a message to a receiver by
transmitting a codeword $x = (x_1, \ldots, x_n) \in \{0,1\}^n$ bit by bit via a channel limited to at most $pn$
corruptions. The channel is online in the sense that at the $i$th step the channel decides whether to flip
the $i$th bit or not and its decision is based only on the bits transmitted so far, i.e., $(x_1, \ldots, x_i)$. This is
in contrast to the classical adversarial channel in which the corruption is chosen by a channel that has
full knowledge of the sent codeword $x$. The best known lower bound on the capacity of both the online
channel and the classical adversarial channel is the well-known Gilbert-Varshamov bound. In this paper
we prove a lower bound on the capacity of the online channel which beats the Gilbert-Varshamov bound
for any positive $p$ such that $H(2p) < \frac{1}{2}$ (where $H$ is the binary entropy function). To do so, we prove
that for any such $p$, a code chosen at random combined with the nearest neighbor decoder achieves with
high probability a rate strictly higher than the Gilbert-Varshamov bound (for the online channel).

1 Introduction 

The classical scenario in coding theory is that of a sender Alice who wants to transmit a message $u$ to a
receiver Bob via a binary communication channel. To do so, Alice encodes her message $u$ into a codeword
$x = (x_1, \ldots, x_n) \in \{0,1\}^n$ and sends it to Bob, who is expected to recover the message $u$. However, the
channel is allowed to corrupt (possibly probabilistically) at most a $p$-fraction of the codeword, i.e., to flip
at most $pn$ bits in $x$, for some $p \in [0, 1]$. The goal is to find a coding scheme by which Alice can send
as many distinct messages as possible while ensuring correct decoding by Bob with high probability (over
the encoding, decoding and the channel). Roughly speaking, we say that a code achieves rate $R$ if $2^{Rn}$
distinct messages can be sent using codewords of length $n$. Viewing the channel as a malicious jammer,
it is important to specify what information the channel has while deciding on which bits to flip. Such a
specification defines the model of communication and strongly affects the obtainable rate of communication.

In one extreme, there is the classical adversarial model in which the channel has full knowledge of the
entire transmitted codeword $x$. Given $x$ and the coding scheme of Alice and Bob, the channel chooses an
error for $x$. Calculating the maximum achievable rate for such a channel is a fundamental open problem
in coding theory. The best known lower bound on the rate is due to Gilbert [8] and Varshamov [20] and
equals $1 - H(2p)$, where $H$ stands for the binary entropy function. Namely, Gilbert and Varshamov show
that there exists a subset of $\{0,1\}^n$ of size roughly $2^{(1-H(2p))n}$ in which every two distinct vectors have
Hamming distance at least $2pn + 1$. This implies that if we take the vectors in this set as codewords then
a nearest neighbor decoder always recovers the correct sent codeword. On the other hand, the best known
upper bound is due to McEliece et al. [15] and is strictly higher than the Gilbert-Varshamov bound for any
$p \in (0, \frac{1}{4})$.

In the second extreme, there are channel models in which the error imposed on the codeword $x$ is
completely independent of $x$. An example of such a channel is the well-known binary symmetric channel studied
(among other channels) by Shannon [19]. In this channel every transmitted bit is flipped independently with
probability $p$, no matter what the sent codeword is. As opposed to the classical adversarial model, the
picture here is completely clear, since Shannon proved that $1 - H(p)$ is a tight lower and upper bound on the
maximum achievable rate.

In this work we continue the line of research in [12, 6, 7], which studies the online channel model,
a channel model whose strength lies somewhere between the above two extremes. In the online channel
model, Alice sends a codeword $x$ bit by bit over a binary communication channel. For each $1 \le i \le n$ the
channel decides whether to flip the $i$th bit or not immediately after $x_i$ arrives. This means that the channel's
decision depends only on $(x_1, \ldots, x_i)$. As in the adversarial model, the channel is limited to corrupt at
most $pn$ of the bits. Roughly speaking, the online channel is stronger than the binary symmetric channel,
as an online channel can mimic the random behavior of a binary symmetric channel. On the other hand,
the online channel is weaker than the classical adversarial channel, as an online channel is limited to make
its decisions in a causal manner. The main theme of this work is to better understand the strength of the
online channel model. In particular, does the maximum achievable rate when communicating over online
channels resemble that of the classical adversarial channel, that of the binary symmetric channel, or maybe
neither?

Studying online adversarial channels is naturally motivated by practical settings in which the sent
message is not known to the channel in advance, and the channel learns it as it is transmitted. For example, the online channel model
simulates a transmission of a codeword $x$ via $n$ uses of a channel over time, where at time $i$ the $i$th bit of $x$
is transmitted. At each step the channel decides whether to flip $x_i$ whereas the receiver waits until the end of
the transmission before decoding. As in the classical adversarial channel model, the channel is limited to at
most $pn$ corruptions, which is usually interpreted as limited processing power or transmit energy. From a
theoretical point of view, understanding the online channel model and comparing it to the classical adversarial
channel model might shed some light on the capacity of the classical adversarial channel, a long-standing
open problem in coding theory.

1.1 Related Work 

Let $C_{\mathrm{online}}(p)$ denote the capacity of the online channel, defined as the maximum achievable rate when
communicating over an online channel allowed to corrupt at most a $p$-fraction of the transmitted codeword.
We give a rigorous definition of the capacity $C_{\mathrm{online}}(p)$ in Section 2. The known bounds on the capacities
of the classical adversarial channel and the binary symmetric channel immediately imply some bounds on
the capacity of the online channel. It is clear that any coding scheme that works for the classical adversarial
channel works also for the online channel, and hence $C_{\mathrm{online}}(p) \ge 1 - H(2p)$. On the other hand, the
online channel can flip every bit independently with probability $p$ (up to $pn$ of them) ignoring the transmitted
codeword $x$. It is not hard to verify that this implies that Shannon's upper bound (for the binary symmetric
channel) holds for the online channel model as well, that is, $C_{\mathrm{online}}(p) \le 1 - H(p)$. Recently, this upper
bound was improved in [12] for any $p > 0.15642$. More precisely, it was shown in [12] that for any $p \ge \frac{1}{4}$ no
communication with positive rate is possible via the online channel and that for $p < \frac{1}{4}$, $C_{\mathrm{online}}(p) \le 1 - 4p$.
This implies that the online channel model is strictly stronger than the binary symmetric channel, in the sense
that there exist values of $p$ (e.g., $p = \frac{1}{4}$) for which no communication is possible over the online channel
whereas a positive rate is possible for the binary symmetric channel. In [12] no non-trivial lower bounds on
$C_{\mathrm{online}}(p)$ were presented. The state of the art on the online channel model is given below (see Figure 1).

Theorem 1.1 ([12]). For any $p \in [0, \ldots]$ it holds that $1 - H(2p) \le C_{\mathrm{online}}(p) \le \min(1 - H(p), (1 - 4p)^+)$,
where $(1 - 4p)^+$ is defined to be $1 - 4p$ for $p < \frac{1}{4}$ and $0$ otherwise.
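To get a feel for the gap between the two sides of Theorem 1.1, the bounds can be evaluated numerically. The following is a minimal sketch; the function names are illustrative and not taken from the paper, and the lower bound is only meaningful for $p \le \frac{1}{4}$.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gv_lower_bound(p: float) -> float:
    """Gilbert-Varshamov rate 1 - H(2p); meaningful for 0 <= p <= 1/4."""
    return 1.0 - binary_entropy(2 * p)

def online_upper_bound(p: float) -> float:
    """min(1 - H(p), (1 - 4p)^+), the upper bound stated in Theorem 1.1."""
    return min(1.0 - binary_entropy(p), max(1.0 - 4 * p, 0.0))

# Print both bounds for a few corruption fractions (values chosen arbitrarily).
for p in (0.01, 0.05, 0.10, 0.20):
    print(p, round(gv_lower_bound(p), 4), round(online_upper_bound(p), 4))
```

For small $p$ the term $1 - H(p)$ is the smaller of the two upper bounds, while for $p > 0.15642$ the term $1 - 4p$ takes over, in line with the discussion above.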

The problem of coding against online channels over large alphabets was studied in [6], where a full
characterization of the capacity is presented. Namely, it is shown in [6] that when communicating over large
alphabets, the online channel is no weaker than the classical adversarial channel and has capacity $1 - 2p$ for
$p < \frac{1}{2}$ and $0$ otherwise. The proofs of the tight upper and lower bounds in [6] use the geometry that fields of
large size enjoy, and it is not clear if these ideas can be extended to the binary case considered in our work.

Figure 1: The bounds on the capacities of the classical adversarial channel and the online channel. The bold
line (in purple) is the upper bound on the capacity of the online channel from [12].

To the best of our knowledge, other than the works mentioned above, communication in the presence of
an online channel has not been explicitly addressed in the literature. Nevertheless, we note that the model
of online channels, being a natural one, has been "on the table" for several decades and the analysis of
the online channel model appears as an open question in the book of Csiszár and Körner [4] in the section
addressing Arbitrarily Varying Channels (AVC). (The AVC model is a broad framework for modeling
channels, which encapsulates our online model. For a nice survey on AVCs see [13].) In addition,
variants of online channels have been addressed in the past, for instance in [2, 11, 17, 18, 16, 9]; however, the
models considered therein differ significantly from ours.



1.2 Our Result 

The Gilbert-Varshamov rate of $1 - H(2p)$ is the state of the art when communicating over classical
adversarial channels. Whether one can improve upon this rate when communicating over online
channels is an intriguing question. An affirmative answer would not only make progress in our
understanding of the online channel model but may also hint at a possible separation between the online and classical
adversarial channels. In our work we address this question and present a lower bound on the capacity of the
online channel that beats the Gilbert-Varshamov bound. More precisely, we prove that for any small enough
$p$, the Gilbert-Varshamov lower bound is not tight for the online channel. This means that for any such $p$,
there exists a coding scheme for the online channel with rate strictly higher than $1 - H(2p)$. This is the
first lower bound for the online channel which is not known to hold for the classical adversarial model. Our
result is stated below.

Theorem 1.2. For any $p$ such that $H(2p) \in (0, \frac{1}{2})$ there exists a $\delta_p > 0$ such that

$$C_{\mathrm{online}}(p) \ge 1 - H(2p) + \delta_p.$$

Note that $H(2p) \in (0, \frac{1}{2})$ for any $p \in (0, \frac{1}{2} \cdot H^{-1}(\frac{1}{2})) \approx (0, 0.055)$. We also note that our result holds
with respect to the average error criterion (see Section 2 for a discussion on the error type). Finally, we
remark that in order to prove Theorem 1.2 we show a lower bound on a much stronger channel model, which
we refer to as the two-step model (defined below).
1.3 Techniques and Proof Overview



Our goal in this paper is to show the existence of an encoder and a decoder for the online channel by which
Alice and Bob achieve some rate $R$ strictly higher than $1 - H(2p)$, which is the rate achieved by the
Gilbert-Varshamov bound. Instead of dealing directly with the online channel model we consider a stronger channel
model, the two-step model, defined as follows. Denote $\alpha = R - \varepsilon$ for some small $\varepsilon > 0$. In the first step Alice
sends the first $\alpha n$ bits of her encoded message and the channel (after viewing this transmitted information)
decides which bits to flip out of these $\alpha n$ bits. In the second step Alice sends the rest of the codeword and
the channel (now with full knowledge of the sent codeword) decides which bits to flip out of the remaining
transmission. The number of bits corrupted in the two steps together is limited to be at most $pn$. Notice that
this model is stronger than the online channel model in the sense that any code allowing communication
over the two-step model will also allow communication over our model of online channels. Indeed, any
adversarial strategy of the online channel model implies a valid strategy for the two-step model achieving
the exact same parameters. Therefore, in order to prove our lower bound on the capacity in Theorem 1.2 it
suffices to consider the two-step model.

We turn to describe our construction of codes that allow communication over the two-step model with
rate $R$ greater than $1 - H(2p)$. We first note that no linear code will suffice. Roughly speaking, this
follows from the fact that each codeword $x$ in a linear code has exactly the same "neighborhood structure".
Thus, when a linear code is used, the problems of communicating over channels with limited information
regarding the codeword $x$ and those with full information are equivalent.¹ We therefore study codes which
are not linear. A natural candidate is a code in which the codewords are chosen completely at random and
the decoder is the nearest neighbor decoder. More precisely, we pick a code $C : [2^{Rn}] \to \{0,1\}^n$ such that
for every $u \in [2^{Rn}]$ the codeword $C(u)$ is independently and uniformly chosen from $\{0,1\}^n$. Given such
a code, Bob outputs a message $u' \in [2^{Rn}]$ that minimizes the Hamming distance between $C(u')$ and the
received corrupted vector.
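The random code and the nearest neighbor decoder described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: codewords are stored as Python integers, messages are indexed from $0$, ties are broken by the smallest message index, and all function names are hypothetical.

```python
import random

def random_code(num_messages: int, n: int, rng: random.Random) -> list:
    """Draw one uniformly random n-bit codeword (as an int) per message."""
    return [rng.getrandbits(n) for _ in range(num_messages)]

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def nearest_neighbor_decode(code: list, received: int) -> int:
    """Return a message whose codeword is closest to the received word."""
    return min(range(len(code)), key=lambda u: hamming_distance(code[u], received))

# Toy usage: 2^3 messages, block length 32, one bit flipped by the channel.
rng = random.Random(0)
code = random_code(num_messages=8, n=32, rng=rng)
received = code[3] ^ (1 << 5)
decoded = nearest_neighbor_decode(code, received)
print(decoded)
```

The decoder performs an exhaustive search over all $2^{Rn}$ messages, which is fine for an existence argument but not meant as an efficient procedure.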

In order to prove our theorem, we show that the decoding succeeds with high probability no matter how
the adversarial online channel behaves. The intuitive idea is the following. In the first step Alice sends a
prefix $m \in \{0,1\}^{\alpha n}$ of a codeword where $\alpha = R - \varepsilon$. Since the code $C$ was constructed randomly, for a
typical prefix $m$ there are exponentially many (about $2^{\varepsilon n}$) codewords in $C$ that share $m$ as a prefix. This
means that the channel is not able to recognize the sent codeword at this point, and therefore it has no good
way to decide which bits from $m$ to flip. Roughly speaking, we show that no matter which bits the adversary
decides to flip in this first step, for most of the codewords that share $m$ as a prefix the error imposed by the
adversary is in a wrong direction and thus will not enable the adversary to cause a decoding error (after the
additional corruption of the second step). In fact, as our analysis shows, for our codes $C$ the best strategy for
the adversary is actually to save its flipping power and to corrupt only in the second step of communication.
This implies that in our setting the two-step channel will concentrate all its error on the second portion of
the codeword. Comparing this state of affairs to the classical channel model in which the error is spread out
over the entire codeword sheds light on the reason we are able to improve upon the Gilbert-Varshamov rate
of $1 - H(2p)$. Very loosely speaking, to prove our improved rate, we first show that a code $C$ constructed
at random is expected to allow successful communication. However, as the events corresponding to correct
decoding are not independent of each other, our proof for the existence of the desired code follows a rather
delicate analysis.



Our analysis holds for the two-step model and thus suffices to prove Theorem 1.2. To improve upon the
result of Theorem 1.2 it is natural to try to generalize our analysis to a channel model that includes more
than two steps. At its extreme (the $n$-step model) we obtain our original online channel. Such a generalized
analysis is left open in this work and seemingly cannot be addressed by the current proof techniques.

¹ In detail, for any linear code of (minimum) distance at most $2pn$ there exists an online channel that causes any decoder to err
with probability at least $\frac{1}{2}$ for every sent message. To see this, assume that $x$ and $y$ are two codewords of distance at most $2pn$, and
let $z$ be a vector of distance at most $pn$ from both $x$ and $y$. Now, consider a channel that maps any codeword $w$ to $w + (z - y)$ or
to $w + (z - x)$ with probability $\frac{1}{2}$ each. Observe that this is an online channel that causes any decoder to err with probability at
least $\frac{1}{2}$.

In Section 2 we set the notation and definitions used throughout our work. We then turn
to prove Theorem 1.2 in Section 3.



2 Preliminaries 



2.1 Notations and Standard Definitions 

For $k \in \mathbb{N}$ we denote $[k] = \{i \in \mathbb{N} \mid 1 \le i \le k\}$. For a vector $x = (x_1, \ldots, x_n) \in \{0,1\}^n$ and a
number $1 \le k \le n$ we denote by $x|_{[k]}$ the projection of $x$ on its first $k$ entries, i.e., $x|_{[k]} = (x_1, \ldots, x_k)$.
The Hamming weight of a binary vector is the number of its 1-entries, and the Hamming distance between
$x \in \{0,1\}^n$ and $y \in \{0,1\}^n$, denoted by $\mathrm{dist}_H(x, y)$, is the Hamming weight of $x + y$, where the addition
is modulo 2 and coordinate-wise.

For two functions $f, g : \mathbb{N} \to \mathbb{R}$, we say that $f$ and $g$ are polynomially equivalent and write $f \sim g$ if
there are constants $c_1, c_2$ such that $n^{-c_1} \cdot f(n) \le g(n) \le n^{c_2} \cdot f(n)$ for all large enough $n \in \mathbb{N}$. Similarly,
we write $f \lesssim g$ if there is a constant $c$ such that $f(n) \le n^{c} \cdot g(n)$ for all large enough $n \in \mathbb{N}$.

The binary entropy function $H : [0, 1] \to [0, 1]$ is defined by $H(0) = H(1) = 0$ and $H(p) = -p \log p -
(1 - p) \log (1 - p)$ for $p \in (0, 1)$, where the logarithms, here and everywhere in this paper, are of base 2. It
is well-known and easy to verify that for any $c \in (0, 1)$, $\binom{n}{cn} \sim 2^{H(c)n}$. We need the following two simple
facts regarding $H$. Notice that the first fact implies the second (by setting the parameters of Fact 2.1 to be
$x = 0$, $y = \frac{1}{2}$, and $\theta = 1 - 4p$).

Fact 2.1. The entropy function $H$ is strictly concave, that is, for any $\theta \in (0,1)$ and $x, y \in [0,1]$ it holds that
$\theta \cdot H(x) + (1 - \theta) \cdot H(y) \le H(\theta \cdot x + (1 - \theta) \cdot y)$, and equality holds if and only if $x = y$.

Fact 2.2. For any $p \in (0, \frac{1}{4})$, $4p < H(2p)$.
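For completeness, the substitution that derives Fact 2.2 from Fact 2.1 can be written out explicitly. Take $x = 0$, $y = \frac{1}{2}$ and $\theta = 1 - 4p$, which lies in $(0,1)$ exactly when $p \in (0, \frac{1}{4})$; since $x \neq y$ the inequality of Fact 2.1 is strict.

```latex
\begin{align*}
\theta \cdot H(0) + (1-\theta) \cdot H\!\left(\tfrac{1}{2}\right)
  &= (1-4p) \cdot 0 + 4p \cdot 1 = 4p, \\
H\!\left(\theta \cdot 0 + (1-\theta) \cdot \tfrac{1}{2}\right)
  &= H\!\left(\tfrac{4p}{2}\right) = H(2p), \\
\text{hence}\quad 4p &< H(2p).
\end{align*}
```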

We need the following version of the Chernoff-Hoeffding bound [10, 14] (addressing random variables
which are not necessarily indicator variables).

Theorem 2.3 (Chernoff-Hoeffding). Let $X_1, X_2, \ldots, X_N$ be independent and identically distributed
random variables taking values in the unit interval $[0, 1]$ with expectation at most $\mu$. Then,

' N 



.1=1 



2.2 The Online Channel Model and the Two-step Model

For $R > 0$, an $(n, Rn)$-code $C$ is a mapping $C : [2^{Rn}] \to \{0,1\}^n$. The elements of the image of $C$ are
called codewords. Define $\alpha = R - \varepsilon$ for some $\varepsilon > 0$ and let $m \in \{0,1\}^{\alpha n}$ be some prefix. Here and
throughout our work we ignore rounding issues and assume that $\alpha n$, $Rn$ and other such expressions are
integers. We denote by $C^{m}$ the set of all messages whose codewords have $m$ as a prefix, i.e., $C^{m} = \{u \in
[2^{Rn}] \mid C(u)|_{[\alpha n]} = m\}$, and by $C^{\overline{m}}$ the set of all messages whose codewords do not have $m$ as a prefix,
i.e., $C^{\overline{m}} = [2^{Rn}] \setminus C^{m}$. A random code is a mapping $\mathcal{C} : [2^{Rn}] \to \{0,1\}^n$ such that for every $u \in [2^{Rn}]$
the codeword $\mathcal{C}(u)$ is independently and uniformly chosen from $\{0,1\}^n$. Notice that we use $C$ to denote a
fixed code and $\mathcal{C}$ to denote a code which forms a random variable.

Consider a code $C$. Throughout this work, we consider the average error success criterion while
communicating over the online channel model. Namely, Alice's message $u$ is considered as uniformly distributed
over $[2^{Rn}]$. Given the message $u$, Alice deterministically maps $u$ to the codeword $C(u) = (x_1, \ldots, x_n) \in
\{0,1\}^n$ and transmits it over the communication channel. For every $i \in [n]$ the decision of the channel
whether to flip $x_i$ or not depends only on $(x_1, \ldots, x_i)$. In addition, the channel is limited to at most $pn$
corruptions. Bob's goal is to recover $u$ from his received vector.
The probability of error of $C$ is defined as the average over all $u \in [2^{Rn}]$ of the probability of error for
the message $u$, i.e., the probability that the message that Bob decodes differs from the message $u$ encoded
by Alice. Here, the probability is taken over the random variables of the channel and of Bob. We say that the
rate $R$ is achievable if for every $\varepsilon > 0$, $\delta > 0$ and every sufficiently large $n$ there exists an $(n, (R - \delta)n)$-code
that allows communication with (average) probability of error at most $\varepsilon$. The supremum of the
achievable rates is called the capacity of the online channel and is denoted by $C_{\mathrm{online}}(p)$. We note that
the discussion in the introduction regarding the known bounds on the capacity of both the binary symmetric
channel and the classical adversarial channel holds for average error (see e.g., pSl]). 

One may also consider a definition for capacity which takes into account the maximum error over mes- 
sages u and not the average error. In this maximum error (or worst case) setting, if the encoding function 
of Alice is considered to be deterministic, it is straightforward to verify that online channels have no ad- 
vantage over the classical adversarial channel. This is no longer the case when one allows randomization 
in Alice's encoding process (referred to as stochastic encoders). As common in the study of Arbitrarily 
Varying Channels (e.g., Q), there is an equivalence between the capacity when considering the models of 
(a) deterministic encoders and average error criteria and (b) stochastic encoders and maximum error success 
criteria. This equivalence holds also for the online channel model studied in this work. 

As mentioned before, for our lower bound we consider a two-step model defined for a parameter $\alpha =
R - \varepsilon$ where $\varepsilon > 0$ is some small constant. In the first step, Alice sends the first $\alpha n$ bits of the encoded
message and the channel decides which bits to flip out of these $\alpha n$ bits. In the second step, Alice sends
the rest of the codeword and the channel decides which bits to flip out of the remaining $(1 - \alpha)n$ bits. The
number of bits corrupted in the two steps together is limited to be at most $pn$. In each step, the decisions
made by the channel are based on the information transmitted in and before the step at hand. The notion of
(average error) capacity is defined as done above. As explained in the introduction, any lower bound on the
capacity of the two-step model holds also for the online channel model.



3 Proof of Theorem 1.2 



Before presenting the proof of our lower bound for the online channel model, let us start with a short
comparison to the Gilbert-Varshamov lower bound that holds for the classical adversarial model. One way
to prove the Gilbert-Varshamov bound is to show that a code $\mathcal{C} : [2^{Rn}] \to \{0,1\}^n$ chosen at random
combined with the nearest neighbor decoder implies a coding scheme of rate almost $1 - H(2p)$ with high
probability. Roughly speaking, the achievable rate in this argument is affected by the number of codewords
$x$ that are far away from any other codeword in $\mathcal{C}$. Namely, one is interested in proving that there are lots of
codewords $x$ for which the ball of radius $2pn$ centered at $x$ includes no codewords except $x$. Indeed, such
a transmitted codeword $x$ will be decoded correctly by a nearest neighbor decoder no matter which error is
imposed by the channel. As the volume of this ball is $\sum_{i=0}^{2pn} \binom{n}{i} \sim 2^{H(2p)n}$, the rate essentially follows.
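The volume estimate used in this argument is easy to check numerically. The sketch below compares the normalized exponent $\frac{1}{n}\log_2$ of the ball volume with $H(2p)$; the block length and the value of $p$ are arbitrary illustrative choices, and the function names are not from the paper.

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def ball_volume(n: int, radius: int) -> int:
    """Number of vectors in {0,1}^n within Hamming distance `radius` of a fixed center."""
    return sum(math.comb(n, i) for i in range(radius + 1))

def volume_exponent(n: int, p: float) -> float:
    """(1/n) * log2 of the volume of the Hamming ball of radius 2pn."""
    return math.log2(ball_volume(n, round(2 * p * n))) / n

n, p = 2000, 0.05
print(volume_exponent(n, p), binary_entropy(2 * p))
```

The two printed numbers differ only by a term of order $\frac{\log n}{n}$, which is the polynomial slack hidden in the $\sim$ notation.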

Recall that for our lower bound on the capacity of the online channel we consider the two-step model.
In the first step Alice sends a prefix $m$ of length $\alpha n$ and the channel chooses which bits to flip out of these
$\alpha n$ bits, and in the second step Alice sends the remaining $(1 - \alpha)n$ bits and the channel again chooses
which bits to flip out of the remaining part of the codeword. Let us now study the required "forbidden ball"
corresponding to a codeword $x$ in the two-step model. To take advantage of the two-step model, consider
fixing an error pattern $e$ imposed on the first portion of $x$. Let $B(x, e)$ be the subset of $\{0,1\}^n$ that satisfies
the following property: if the codeword $x$ was transmitted, the error pattern $e$ was imposed on the first
portion of $x$ in the first step, and there are no codewords other than $x$ in $B(x, e)$, then no matter what the
channel does in the second step the decoding of Bob will succeed. We define $B(x, e)$ (denoted as $B(z)$ for
$z = x + e$) rigorously and analyze its size in the upcoming section. Specifically, we show that the size of
$B(z)$ is exponentially smaller than $2^{H(2p)n}$. This fact is a core ingredient in our proof. Combining it with
several additional ideas leads to our improved lower bound.



We now turn to present the proof of Theorem 1.2. In Section 3.1 we formally define the "forbidden
ball" $B(z)$ described above and analyze its size. In Sections 3.2 and 3.3 we prove our theorem by showing
that with high probability over the codeword $x$ chosen by Alice, the decoding is successful. Namely, that
Bob decodes a codeword $x'$ which is equal to the transmitted codeword $x$. In Section 3.2 we analyze the
probability (over $x$) that Bob decodes an incorrect codeword $x'$ in which $x$ and $x'$ differ in their first $\alpha n$
bits. In Section 3.3 we address $x$ and $x'$ which agree on their first $\alpha n$ bits. Finally, in Section 3.4 we prove
Theorem 1.2.

3.1 The "Forbidden Ball" $B_\alpha^{(p,q)}(z)$

Consider a situation in which Alice transmits a codeword $x$. Namely, in the first step, Alice sends the first $\alpha n$
bits of $x$ and the channel flips $qn$ of them for some $q \in [0, \min(p, \alpha)]$. Let $e_1 \in \{0,1\}^{\alpha n} \times \{0\}^{(1-\alpha)n}$ be the
vector of Hamming weight $qn$ that represents the channel's corruptions in the first step, and let $z = x + e_1$ be
the (partially) corrupted codeword after the first step. In the second step Alice sends the remaining $(1 - \alpha)n$
bits of $x$. Since the channel is limited to a total number of $pn$ corruptions, at most $(p - q)n$ of the bits can be
flipped in this step. Let $e_2 \in \{0\}^{\alpha n} \times \{0,1\}^{(1-\alpha)n}$ be the vector of Hamming weight at most $(p - q)n$ that
represents the channel's corruptions in the second step, and let $w = z + e_2 = x + e_1 + e_2$ be the corrupted
codeword received by Bob.

Conditioning on the first step, namely on the value of $z$, we are interested in counting the vectors that
the channel (in its second step) may force Bob to consider in his nearest neighbor decoding. These are all
the vectors $y \in \{0,1\}^n$ for which there exists a vector $w \in \{0,1\}^n$ such that

• $w$ is of distance at most $pn$ from $y$, and

• $w$ and $z$ agree on the first $\alpha n$ bits and the distance between them is at most $(p - q)n$.

Notice that the second item follows from the fact that our channel can only corrupt bits in the $(1 - \alpha)n$ suffix
of $z$ in the second step. We define

$$B_\alpha^{(p,q)}(z) = \{y \in \{0,1\}^n \mid \exists w \in \{0,1\}^n \text{ s.t. } \mathrm{dist}_H(w, y) \le pn,\ w|_{[\alpha n]} = z|_{[\alpha n]},\ \mathrm{dist}_H(z, w) \le (p-q)n\}.$$

It is not hard to verify that (a) the original transmitted codeword is in $B_\alpha^{(p,q)}(z)$, and (b) if this is the
only codeword in $B_\alpha^{(p,q)}(z)$ then Bob will decode successfully. It is also not hard to verify that the size of
$B_\alpha^{(p,q)}(z)$ does not depend on $z$ and therefore we can denote $B_\alpha^{(p,q)} = |B_\alpha^{(p,q)}(z)|$ for any $z \in \{0,1\}^n$. The
following claim bounds $B_\alpha^{(p,q)}$ and is proven in the appendix.
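For very small parameters the forbidden ball can be enumerated by brute force directly from its definition, which makes properties (a) and the independence of its size from $z$ easy to check. In the sketch below vectors are Python integers, bit positions $0, \ldots, \alpha n - 1$ play the role of the prefix, and the integer arguments `prefix_len`, `t` and `s` stand for $\alpha n$, $pn$ and $qn$; these names and the toy parameters are illustrative assumptions.

```python
from itertools import combinations

def flips(positions, max_weight):
    """Yield every bitmask supported on `positions` with at most `max_weight` ones."""
    positions = list(positions)
    for weight in range(max_weight + 1):
        for chosen in combinations(positions, weight):
            mask = 0
            for i in chosen:
                mask |= 1 << i
            yield mask

def forbidden_ball(z, n, prefix_len, t, s):
    """Brute-force B(z): t is the total budget pn, s the number qn of first-step flips."""
    suffix = range(prefix_len, n)          # positions sent in the second step
    ball = set()
    for e2 in flips(suffix, t - s):        # second-step corruptions, weight <= (p-q)n
        w = z ^ e2                         # a word Bob may receive
        for d in flips(range(n), t):       # every y within distance pn of w
            ball.add(w ^ d)
    return ball

# Toy check: the size is the same for two different centers z.
n, prefix_len, t, s = 8, 5, 2, 1
print(len(forbidden_ball(0, n, prefix_len, t, s)),
      len(forbidden_ball(0b10110101, n, prefix_len, t, s)))
```

The enumeration is exponential in $n$ and is only meant to illustrate the definition; the claim below concerns the asymptotic size.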

Claim 3.1. For any $0 < p < \frac{1}{2} \cdot H^{-1}(\frac{1}{2})$ there exists an $\eta > 0$ such that for any $1 - H(2p) < \alpha < 1 - 2p$
and $q \in [0, p]$ it holds that $B_\alpha^{(p,q)} \lesssim 2^{(H(2p) - \eta)n}$.



3.2 Errors Caused by Codewords with Distinct Prefixes 

Let $\mathcal{C} : [2^{Rn}] \to \{0,1\}^n$ be a code chosen at random and let $x \in \{0,1\}^n$ be a codeword sent by Alice.
Consider the setting in which Alice, in the first step, sends the prefix $m = x|_{[\alpha n]}$ and the channel corrupts
$qn$ of its bits for some $q \in [0, \min(p, \alpha)]$. Let $e \in \{0,1\}^{\alpha n} \times \{0\}^{(1-\alpha)n}$ be the vector of Hamming weight
$qn$ that represents the channel's corruptions in the first step. In the second step Alice sends the last $(1 - \alpha)n$
bits of $x$ and the channel is allowed to flip at most $(p - q)n$ of these bits. After the first step, the set of
vectors that are of Hamming distance at most $pn$ from a vector that the channel can cause Bob to receive is
exactly $B_\alpha^{(p,q)}(x + e)$. Therefore, if a nearest neighbor decoder fails then there must be another codeword
of $\mathcal{C}$ (in addition to $x$) in $B_\alpha^{(p,q)}(x + e)$. In this section we study the probability that $B_\alpha^{(p,q)}(x + e)$ contains
a codeword with a prefix that differs from $m$ and show that it is small no matter what $m$ or $e$ are. Here, the
probability is taken over the random construction of $\mathcal{C}$.

In general, it is not hard to verify that in expectation a random code $\mathcal{C}$ will indeed ensure an
exponentially decaying decoding error in the case under study (here, the expectation is over the code construction
and the error is over the messages of Alice). However, as the events corresponding to correct decoding are
not independent of each other, our proof includes a rather delicate analysis. Our proof in this section consists
of two parts. In the first part, we identify a certain property of codes $C$, and prove that it holds with very
high probability. This property is then used in the second part of our proof, and enables us to cope with the
dependencies mentioned above. We start by defining the needed property of $C$.

A code is considered as good with respect to the pair (m, e) if it has the following two properties: (a) the 
number of codewords with prefix m is close to its expectation and, in addition, (b) the number of codewords 
that do not start with m but alternatively may cause a decoding error on the transmission of a word that 
does start with m is not much larger than the expectation. This notion is formally defined below. We then 
show that for every $m$ and $e$ a code $\mathcal{C}$ chosen at random is good with respect to $(m, e)$ with high probability.



Recall the definitions of $C^{m}$ and $C^{\overline{m}}$ from Section 2.2.



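Definition 3.2. For a natural number $n$, $p > 0$, $R > 0$, $\varepsilon > 0$, $\alpha = R - \varepsilon$, $m \in \{0,1\}^{\alpha n}$ and $e \in
\{0,1\}^{\alpha n} \times \{0\}^{(1-\alpha)n}$ of Hamming weight $qn$ for $q \in [0, \min(p, \alpha)]$, we say that a code $C : [2^{Rn}] \to \{0,1\}^n$
is good with respect to the pair $(m, e)$ if

1. $2^{\varepsilon n - 1} \le |C^{m}| \le 2^{\varepsilon n + 1}$, and

2. $\sum_{x \in Z_m} |\{u \in C^{\overline{m}} \mid C(u) \in B_\alpha^{(p,q)}(x + e)\}| \le B_\alpha^{(p,q)} \cdot 2^{\varepsilon n + 2}$,

where $Z_m$ is the set of all vectors in $\{0,1\}^n$ with $m$ as a prefix, i.e., $Z_m = \{z \in \{0,1\}^n \mid z|_{[\alpha n]} = m\}$.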



A remark regarding Item (2) of Definition 3.2 is in order. In general, Item (2) estimates the number of
codewords in $C^{\overline{m}}$ that happen to be included in "forbidden balls" of type $B_\alpha^{(p,q)}(x + e)$ for vectors $x \in Z_m$
(namely, $x|_{[\alpha n]} = m$). Later in our proof, we will think of $x$ as a randomly chosen codeword with prefix $m$,
and the l.h.s. of Item (2) will correspond to the expected number of codewords in its "forbidden ball".

Lemma 3.3. For every large enough $n$, $p > 0$, $R > 0$, $\varepsilon > 0$, $\alpha = R - \varepsilon$, a prefix $m \in \{0,1\}^{\alpha n}$ and
$e \in \{0,1\}^{\alpha n} \times \{0\}^{(1-\alpha)n}$ of Hamming weight at most $pn$, the probability that a code $\mathcal{C} : [2^{Rn}] \to \{0,1\}^n$
chosen at random is good with respect to $(m, e)$ is at least $1 - e^{-\Omega(2^{\varepsilon n})}$.

Proof: Fix a pair $(m, e)$ and assume that the Hamming weight of $e$ is $qn$ for $q \in [0, p]$. For every $u \in [2^{Rn}]$
denote by $X_u$ the indicator random variable defined to be 1 if $u \in \mathcal{C}^{m}$ and 0 otherwise. Notice that the $X_u$'s
are independent and identically distributed and that $|\mathcal{C}^{m}| = \sum_{u \in [2^{Rn}]} X_u$. Also, $E[X_u] = \Pr[X_u = 1] =
2^{-\alpha n}$ and linearity of expectation implies that $E[|\mathcal{C}^{m}|] = 2^{Rn} \cdot 2^{-\alpha n} = 2^{\varepsilon n}$. Applying the standard Chernoff
bound (see, e.g., [1, Appendix A]) we get that Item (1) of Definition 3.2 holds with probability at least
$1 - e^{-\Omega(2^{\varepsilon n})}$.
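The concentration behind Item (1) can be observed in a small simulation. The sketch below draws only the $\alpha n$-bit prefix of each random codeword (the remaining bits do not affect membership in $\mathcal{C}^{m}$) and counts how many messages fall on each prefix; the parameter names `rate_bits` and `prefix_bits`, standing for $Rn$ and $\alpha n$, the toy sizes and the seed are all illustrative assumptions.

```python
import random
from collections import Counter

def prefix_class_sizes(rate_bits: int, prefix_bits: int, rng: random.Random) -> Counter:
    """Count, for each prefix m, the messages whose random codeword starts with m."""
    return Counter(rng.getrandbits(prefix_bits) for _ in range(2 ** rate_bits))

rate_bits, prefix_bits = 14, 8
rng = random.Random(2011)                      # arbitrary seed, for reproducibility only
sizes = prefix_class_sizes(rate_bits, prefix_bits, rng)

expected = 2 ** (rate_bits - prefix_bits)      # plays the role of 2^{eps n}
in_range = sum(1 for m in range(2 ** prefix_bits)
               if expected // 2 <= sizes[m] <= 2 * expected)
print(in_range, "of", 2 ** prefix_bits, "prefixes satisfy Item (1)")
```

Already at these toy sizes one would expect nearly all prefixes to land within a factor of two of the mean, which is the behavior the Chernoff bound quantifies.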

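Now, given that (1) holds, we will show that the probability that (2) holds is $1 - e^{-\Omega(2^{\varepsilon n})}$. This will imply
that with such probability both (1) and (2) hold, as follows from $\Pr[(1) \wedge (2)] = \Pr[(1)] \cdot \Pr[(2) \mid (1)]$.
Since the summands in Item (2) of Definition 3.2 are not independent, we cannot directly apply the
Chernoff-Hoeffding bound. To overcome this issue, we express the summation in (2) as another summation
of independent random variables. Details follow. Recall that $Z_m$ stands for the set of all vectors in $\{0,1\}^n$
with $m$ as a prefix. Define for every $u \in \mathcal{C}^{\overline{m}}$ the random variable

$$Y_u = \left|\{x \in Z_m \mid \mathcal{C}(u) \in B_\alpha^{(p,q)}(x + e)\}\right|.$$

Namely, $Y_u$ counts the number of balls $B_\alpha^{(p,q)}(x + e)$ (with $x \in Z_m$) which include $\mathcal{C}(u)$. Denote $Y =
\sum_{u \in \mathcal{C}^{\overline{m}}} Y_u$ and observe that $Y$ equals the sum from (2). Observe that the $Y_u$'s are independent and,
moreover, they are independent even when conditioning on the size of the set $\mathcal{C}^{\overline{m}}$. Given that $u \in \mathcal{C}^{\overline{m}}$, for every
$x \in Z_m$ the probability that $\mathcal{C}(u) \in B_\alpha^{(p,q)}(x + e)$ is at most $\frac{B_\alpha^{(p,q)}}{2^n - 2^{(1-\alpha)n}}$, since $\mathcal{C}(u)$ is then uniform over the
$2^n - 2^{(1-\alpha)n}$ vectors whose prefix differs from $m$. Hence,

$$E[Y_u] \le |Z_m| \cdot \frac{B_\alpha^{(p,q)}}{2^n - 2^{(1-\alpha)n}} \le 2 \cdot 2^{(1-\alpha)n} \cdot \frac{B_\alpha^{(p,q)}}{2^n}.$$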
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Notice that for every n G C™ we have y„ < si^'^^ and define = £ [0, f ] and Y' = 

Ba ' Be 

any k G [2''"-\ 2^"+^] use the Chernoff-Hoeffding bound (Theorem 2.3 1 to obtain 



. For 



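$$\Pr\left[ Y > |B_\alpha^{(p,q)}| \cdot 2^{\varepsilon n + 2} \;\middle|\; |C^m| = k \right] \le \Pr\left[ Y' > \frac{4 (2^{Rn} - k)}{2^{\alpha n}} \;\middle|\; |C^m| = k \right] \le e^{-\Omega(2^{\varepsilon n})} .$$

Finally, for a large enough $n$ we obtain

$$\Pr[(2) \mid (1)] = 1 - \Pr\left[ Y > |B_\alpha^{(p,q)}| \cdot 2^{\varepsilon n + 2} \;\middle|\; (1) \right]$$
$$= 1 - \sum_{k \in [2^{\varepsilon n - 1}, 2^{\varepsilon n + 1}]} \Pr\left[ Y > |B_\alpha^{(p,q)}| \cdot 2^{\varepsilon n + 2} \;\middle|\; |C^m| = k \wedge (1) \right] \cdot \Pr\left[ |C^m| = k \;\middle|\; (1) \right]$$
$$\ge 1 - e^{-\Omega(2^{\varepsilon n})} \cdot \sum_{k \in [2^{\varepsilon n - 1}, 2^{\varepsilon n + 1}]} \Pr\left[ |C^m| = k \;\middle|\; (1) \right] = 1 - e^{-\Omega(2^{\varepsilon n})} .$$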



We now turn to the second part of our proof. Let $m$ be a prefix of a codeword sent by Alice and let $e$ be
the vector that represents the corruptions made by the channel in the first step. Consider a fixed choice of the
codewords in $C$ which do not have $m$ as a prefix (i.e., $C|_{C^{\bar m}}$). The following lemma shows that the number
of messages in $C^m$ for which the channel may cause a decoding error due to messages in $C^{\bar m}$ is small with
high probability. The probability here is over the choice of the codewords that start with $m$ (since $C|_{C^{\bar m}}$ is
fixed).

For any $u \in C^m$ define $T_u$ to be the number of codewords of messages from $C^{\bar m}$ in the "forbidden ball"
corresponding to $u$. Namely, $T_u = |\{u' \in C^{\bar m} \mid C(u') \in B_\alpha^{(p,q)}(C(u) + e)\}|$. Let $P_u$ be an indicator random
variable defined to be 1 if $T_u \ge 1$ and 0 otherwise. Finally, we let $P^{(m,e)}$ denote the number of codewords
with prefix $m$ whose corresponding "forbidden balls" contain codewords associated with elements from
$C^{\bar m}$. Formally, $P^{(m,e)} = \sum_{u \in C^m} P_u$. We stress that messages $u$ with $P_u = 1$ are considered as messages
for which the channel may cause a decoding error. Thus our objective is to show that $P^{(m,e)}$ is small.

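Lemma 3.4. For every $0 < p < \frac{1}{2} \cdot H^{-1}(\frac{1}{2})$ there exists a $\delta_p > 0$ such that for $\varepsilon < \delta < \delta_p$,
$R = 1 - H(2p) + \delta$ and $\alpha = R - \varepsilon$ the following holds for any sufficiently large $n$. For every prefix $m \in \{0,1\}^{\alpha n}$,
$e \in \{0,1\}^{\alpha n} \times \{0\}^{(1-\alpha)n}$ of Hamming weight at most $pn$, a fixed set of messages $C^m$ and a fixed restriction
$\tilde{C}$ of $C$ to $C^{\bar m}$,

$$\Pr\left[ P^{(m,e)} \le 2^{\varepsilon n / 2} \;\middle|\; C|_{C^{\bar m}} = \tilde{C} \;\wedge\; C \text{ is good with respect to } (m, e) \right] \ge 1 - e^{-2^{\Omega(n)}} .$$

Here, the probability is taken over the random construction of $C$.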

Proof: ForO <p < h-H^^{^) take 6p = min(| ■r}^H{2p) — 2p), where is the constant whose existence 



is guaranteed in Claim 3.1. Notice that $\delta_p > 0$ since $H(2p) > 2p$, as follows from Fact 2.2.



Fix a pair $(m, e)$ and assume that the Hamming weight of $e$ is $qn$ for $q \in [0, p]$. Denote by $G^{(m,e)}$
the event that $C$ is good with respect to $(m, e)$. Conditioning on $C|_{C^{\bar m}} = \tilde{C}$ and on $G^{(m,e)}$, every $C(u)$
for $u \in C^m$ is independently and uniformly distributed over the vectors in $\{0,1\}^n$ that start with $m$, and
in particular the $P_u$'s are independent. Since $C$ satisfies Item (2) of Definition 3.2 we get that for every
$u \in C^m$,

$$E[P_u] \le E[T_u] \le \frac{|B_\alpha^{(p,q)}| \cdot 2^{\varepsilon n + 2}}{2^{(1-\alpha)n}} = \frac{4 \, |B_\alpha^{(p,q)}|}{2^{(1-R)n}} .$$



Notice that our choice of $\delta_p$ implies that $1 - H(2p) < R - \varepsilon = \alpha < R < 1 - 2p$ and hence
$|B_\alpha^{(p,q)}| \le 2^{(H(2p) - \eta)n}$ by Claim 3.1. Since $C$ satisfies Item (1) of Definition 3.2
we obtain that



E 



p{m,e) 



C AG^'^ 



< IG"^ 



< 



2(i-i?.)« - 2(i--R)" 
— 8 • 2'^'^+'^^'''" < 8 • 2*^'^/''+J*p^'''" < 8 • 2^"/^. 






For a sufficiently large $n$, applying the Chernoff-Hoeffding bound (Theorem 2.3) yields



Pr 



C A G^"''"^ 



< Pr 



C A G('"'") 



as desired. 



Combining Lemmas 3.3 and 3.4 we get the following corollary.

Corollary 3.5. For every $0 < p < \frac{1}{2} \cdot H^{-1}(\frac{1}{2})$ there exists a $\delta_p > 0$ such that for $\varepsilon < \delta < \delta_p$,
$R = 1 - H(2p) + \delta$ and $\alpha = R - \varepsilon$ the following holds for any sufficiently large $n$. The probability that
a code $C : [2^{Rn}] \to \{0,1\}^n$ chosen at random satisfies that for every prefix $m \in \{0,1\}^{\alpha n}$ and
$e \in \{0,1\}^{\alpha n} \times \{0\}^{(1-\alpha)n}$ of Hamming weight at most $pn$, $C$ is good with respect to $(m, e)$ and $P^{(m,e)} \le 2^{\varepsilon n / 2}$,
is at least $1 - e^{-2^{\Omega(n)}}$.



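Proof: Let $(m, e)$ be a fixed pair and denote by $G^{(m,e)}$ the event that $C$ is good with respect to $(m, e)$. In
the following $\tilde{C}$ denotes a restriction of $C$ to $C^{\bar m}$. We have

$$\Pr\left[ P^{(m,e)} \le 2^{\varepsilon n / 2} \wedge G^{(m,e)} \right] = \sum_{\tilde{C}} \Pr\left[ P^{(m,e)} \le 2^{\varepsilon n / 2} \;\middle|\; \tilde{C} \wedge G^{(m,e)} \right] \cdot \Pr\left[ \tilde{C} \wedge G^{(m,e)} \right]$$
$$\ge \left( 1 - e^{-2^{\Omega(n)}} \right) \cdot \sum_{\tilde{C}} \Pr\left[ \tilde{C} \wedge G^{(m,e)} \right] = \left( 1 - e^{-2^{\Omega(n)}} \right) \cdot \Pr\left[ G^{(m,e)} \right]$$
$$\ge \left( 1 - e^{-2^{\Omega(n)}} \right) \left( 1 - e^{-2^{\Omega(n)}} \right) \ge 1 - e^{-2^{\Omega(n)}} ,$$

where the first and the second inequalities follow, respectively, from Lemmas 3.4 and 3.3. Taking the union
bound over all the possible pairs $(m, e)$ completes the proof. ■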
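
The union bound step can be spelled out as follows. This is only a sketch: the constant $c > 0$ is a placeholder, introduced here for illustration, for the constant hidden in the $\Omega(n)$ of the per-pair failure probability. There are $2^{\alpha n}$ prefixes $m$ and at most $2^{\alpha n}$ vectors $e$ supported on the first $\alpha n$ coordinates, so

```latex
\[
\Pr\bigl[\text{some pair } (m,e) \text{ fails}\bigr]
\;\le\; \sum_{(m,e)} e^{-2^{cn}}
\;\le\; 2^{\alpha n} \cdot 2^{\alpha n} \cdot e^{-2^{cn}}
\;\le\; e^{\,2n \ln 2 \,-\, 2^{cn}}
\;=\; e^{-2^{\Omega(n)}} .
\]
```

The number of pairs is only singly exponential in $n$, so it is absorbed by the doubly exponential decay of each term.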



3.3 Errors Caused by Codewords with the Same Prefix 

In this section we consider decoding errors caused by codewords in $C$ that have a prefix (of length $\alpha n$)
identical to the prefix of the transmitted codeword. Namely, we consider the scenario that Alice sends a
codeword $x$, Bob gets the corrupted vector $y$, and the message that Bob outputs corresponds to a codeword
that differs from $x$ but shares the prefix $x|_{[\alpha n]}$. A way to handle such errors is to verify that for every prefix
$m$, our code $C$ does not include (many) pairs of codewords that share $m$ as a prefix and are close together,
namely of Hamming distance at most $2pn$. This is the type of analysis that actually corresponds to the
classical adversarial channel, and can be used here as we are considering a special case of decoding errors.

The following lemma says that a code $C : [2^{Rn}] \to \{0,1\}^n$ chosen at random with $R < 1 - 4p$ has only
few pairs of codewords that share a prefix and have Hamming distance at most $2pn$.

Lemma 3.6. For every $p \in [0, \frac{1}{4})$, $R < 1 - 4p$, a sufficiently small $\varepsilon > 0$ and $\alpha = R - \varepsilon$ there exists a $\gamma > 0$
for which the following holds for any sufficiently large $n$. The probability that a code $C : [2^{Rn}] \to \{0,1\}^n$
chosen at random satisfies that

1. for every $m \in \{0,1\}^{\alpha n}$, $2^{\varepsilon n - 1} \le |C^m| \le 2^{\varepsilon n + 1}$

2. and for every $m \in \{0,1\}^{\alpha n}$, besides at most $2^{(\alpha - \gamma)n}$ of them, there exists a set $X_m \subseteq C^m$ of size
$|X_m| \le 2^{(\varepsilon - \gamma)n}$ such that every distinct $u_1, u_2 \in C^m \setminus X_m$ satisfy $\mathrm{dist}_H(C(u_1), C(u_2)) > 2pn$,

is at least $1 - e^{-2^{\Omega(n)}}$.



In order to prove Lemma 3.6 we need the following (known) claim that shows that with high probability
a random code almost achieves the Gilbert-Varshamov bound. We include its proof for completeness.

Claim 3.7. For any $p' \in (0, \frac{1}{4})$, $\varepsilon' > 0$ and $R' < 1 - H(2p') - \varepsilon'$ the following holds for any sufficiently large
$n'$. The probability that for a code $C : [2^{R'n'}] \to \{0,1\}^{n'}$ chosen at random there exists a set $X \subseteq [2^{R'n'}]$ of
size $|X| \le 2^{(R' - \varepsilon'/2)n'}$ such that every distinct $u_1, u_2 \in [2^{R'n'}] \setminus X$ satisfy $\mathrm{dist}_H(C(u_1), C(u_2)) > 2p'n'$
is at least $1 - 2^{-\varepsilon' n'/2}$.



Proof: The probability of two distinct messages to be mapped by $C$ to codewords of Hamming distance at
most $2p'n'$ is

$$\frac{\sum_{i=0}^{2p'n'} \binom{n'}{i}}{2^{n'}} \le \frac{2^{H(2p')n'}}{2^{n'}} .$$

Denote by $Y$ the number of pairs of messages which are mapped by $C$
to codewords of Hamming distance at most $2p'n'$ and notice that $E[Y] \le 2^{2R'n'} \cdot 2^{(H(2p') - 1)n'} \le 2^{(R' - \varepsilon')n'}$. Apply Markov's
inequality to get that

$$\Pr\left[ Y > 2^{(R' - \varepsilon'/2)n'} \right] \le \frac{E[Y]}{2^{(R' - \varepsilon'/2)n'}} \le 2^{-\varepsilon' n'/2} .$$

This implies that with probability at most $2^{-\varepsilon' n'/2}$ we have $Y > 2^{(R' - \varepsilon'/2)n'}$. Taking one message from
every pair counted in $Y$, we get the required set $X$. ■
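
The last step of the proof (removing one message from every close pair) can be sketched in Python as follows. The helper names and the representation of codewords as integers are assumptions of this example; the quadratic pair scan is meant only for toy sizes.

```python
import itertools
import random


def hamming(a: int, b: int) -> int:
    """Hamming distance between two codewords stored as integers."""
    return bin(a ^ b).count("1")


def random_code(num_messages: int, n: int, rng: random.Random) -> list[int]:
    """Assign an independent, uniformly random n-bit codeword to each message."""
    return [rng.getrandbits(n) for _ in range(num_messages)]


def expurgate(code: list[int], d: int) -> set[int]:
    """Return a set X containing one message from every pair at distance <= d.

    At most one message is added per close pair, so |X| is at most the number
    Y of close pairs, and all surviving messages are pairwise more than d apart.
    """
    removed: set[int] = set()
    for u1, u2 in itertools.combinations(range(len(code)), 2):
        if hamming(code[u1], code[u2]) <= d:
            removed.add(u2)
    return removed


if __name__ == "__main__":
    code = random_code(64, 24, random.Random(0))  # hypothetical toy parameters
    X = expurgate(code, 4)
    print(len(X), "messages removed out of", len(code))
```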



We now turn to prove our lemma. 



Proof of Lemma 3.6: The probability that $C$ satisfies Item (1) is at least $1 - e^{-2^{\Omega(n)}}$ as follows from the
argument presented in the proof of Lemma 3.3 and a union bound argument over all the possible $m$'s. Now
we will show, given (1) holds, that the probability that (2) holds is $1 - e^{-2^{\Omega(n)}}$. This will imply that both (1)
and (2) hold with probability $1 - e^{-2^{\Omega(n)}}$.

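In order to analyze the probability of (2), let us first fix the size of the image of $C$ for every prefix:
for every $m \in \{0,1\}^{\alpha n}$ denote $k_m = |C^m| \in [2^{\varepsilon n - 1}, 2^{\varepsilon n + 1}]$. Denote by $T_m$ the indicator random variable
defined to be 1 if there is no set $X_m \subseteq C^m$ of size $|X_m| \le 2^{(\varepsilon - \gamma)n}$ such that every distinct $u_1, u_2 \in C^m \setminus X_m$
satisfy $\mathrm{dist}_H(C(u_1), C(u_2)) > 2pn$ (where $\gamma$ is some positive constant to be determined later). In addition,
define $T = \sum_{m \in \{0,1\}^{\alpha n}} T_m$. Notice that given the fixed $k_m$'s, we can think of $C$ as $2^{\alpha n}$ random mappings,
where the mapping which corresponds to $m$ maps every element in a domain of size $k_m$ to an element in
$\{0,1\}^{(1-\alpha)n}$ uniformly and independently. Denote $n' = (1 - \alpha)n$, $R' = \frac{1}{n'} \cdot \log k_m$ and $p' = \frac{p}{1 - \alpha}$, so that $2p'n' = 2pn$.

Our assumption that $R < 1 - 4p$ implies that $H(2p')$ is bounded away from 1 and hence for a small enough
$\varepsilon > 0$ we have that $R' = \frac{\log k_m}{(1 - \alpha)n} < 1 - H(2p') - \varepsilon'$ for some $\varepsilon' > 0$. Define $\gamma = \varepsilon'(1 - \alpha)/4$. Apply
Claim 3.7 and derive that the probability that there is no set $X_m \subseteq C^m$ of size
$|X_m| \le 2^{(R' - \varepsilon'/2)n'} \le 2^{(\varepsilon - 2\gamma)n + 1} \le 2^{(\varepsilon - \gamma)n}$ such that every distinct $u_1, u_2 \in C^m \setminus X_m$ satisfy
$\mathrm{dist}_H(C(u_1), C(u_2)) > 2pn$ is at most $2^{-\varepsilon' n'/2} = 2^{-2\gamma n}$. Therefore, $E[T_m] = \Pr[T_m = 1] \le 2^{-2\gamma n}$.
The $T_m$'s are independent (given the fixed $k_m$'s) so for a sufficiently large $n$ we can apply the
Chernoff-Hoeffding bound (Theorem 2.3) to get

$$\Pr\left[ T > 2^{(\alpha - \gamma)n} \;\middle|\; \forall m.\ |C^m| = k_m \right] \le \Pr\left[ T > 2 \cdot 2^{(\alpha - 2\gamma)n} \;\middle|\; \forall m.\ |C^m| = k_m \right] \le e^{-2^{\Omega(n)}} .$$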



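Finally,

$$\Pr[(2) \mid (1)] = 1 - \sum_{\{k_m\}_{m \in \{0,1\}^{\alpha n}}} \Pr\left[ T > 2^{(\alpha - \gamma)n} \;\middle|\; \forall m.\ |C^m| = k_m \wedge (1) \right] \cdot \Pr\left[ \forall m.\ |C^m| = k_m \;\middle|\; (1) \right]$$
$$\ge 1 - e^{-2^{\Omega(n)}} \cdot \sum_{\{k_m\}_{m \in \{0,1\}^{\alpha n}}} \Pr\left[ \forall m.\ |C^m| = k_m \;\middle|\; (1) \right] = 1 - e^{-2^{\Omega(n)}} .$$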



3.4 Proof of Theorem 1.2



The following corollary stems from Corollary 3.5 and Lemma 3.6 by Fact 2.2 and the union bound.

Corollary 3.8. For every $0 < p < \frac{1}{2} \cdot H^{-1}(\frac{1}{2})$ there exist $\delta > 0$, $\varepsilon > 0$, and $\gamma > 0$ such that for
$R = 1 - H(2p) + \delta$ and $\alpha = R - \varepsilon$ the following holds for any sufficiently large $n$. The probability that a
code $C : [2^{Rn}] \to \{0,1\}^n$ chosen at random satisfies that

1. for every $m \in \{0,1\}^{\alpha n}$, $2^{\varepsilon n - 1} \le |C^m| \le 2^{\varepsilon n + 1}$,

2. for every prefix $m \in \{0,1\}^{\alpha n}$ and $e \in \{0,1\}^{\alpha n} \times \{0\}^{(1-\alpha)n}$ of Hamming weight at most $pn$, $P^{(m,e)} \le 2^{\varepsilon n / 2}$,

3. and for every $m \in \{0,1\}^{\alpha n}$, besides at most $2^{(\alpha - \gamma)n}$ of them, there exists a set $X_m \subseteq C^m$ of size
$|X_m| \le 2^{(\varepsilon - \gamma)n}$ such that every distinct $u_1, u_2 \in C^m \setminus X_m$ satisfy $\mathrm{dist}_H(C(u_1), C(u_2)) > 2pn$,

is at least $1 - e^{-2^{\Omega(n)}}$.



Equipped with Corollary 3.8 we are ready to prove Theorem 1.2.

Proof of Theorem 1.2: Fix $0 < p < \frac{1}{2} \cdot H^{-1}(\frac{1}{2})$ and let $\delta > 0$, $\varepsilon > 0$, $\gamma > 0$, $R = 1 - H(2p) + \delta$
and $\alpha = R - \varepsilon$ be as in Corollary 3.8. Also, let $C : [2^{Rn}] \to \{0,1\}^n$ be a code that satisfies the three
items in the corollary. Denote by $M$ the set of all $m \in \{0,1\}^{\alpha n}$ for which there is a set $X_m \subseteq C^m$ of size
$|X_m| \le 2^{(\varepsilon - \gamma)n}$ such that every distinct $u_1, u_2 \in C^m \setminus X_m$ satisfy $\mathrm{dist}_H(C(u_1), C(u_2)) > 2pn$, and by
$\bar{M}$ its complement $\bar{M} = \{0,1\}^{\alpha n} \setminus M$. The corollary guarantees that $|\bar{M}| \le 2^{(\alpha - \gamma)n}$. We restrict the code
$C$ to the domain $U = [2^{Rn}] \setminus (\cup_{m \in M} X_m)$ and denote the restricted code by $C' : U \to \{0,1\}^n$. Notice that
$|U| \ge 2^{Rn} - 2^{\alpha n} \cdot 2^{(\varepsilon - \gamma)n} = 2^{Rn} - 2^{(R - \gamma)n} \ge 2^{Rn - 1}$ for a sufficiently large $n$. We show that this code
and the nearest neighbor decoder supply high probability of correct decoding and hence imply the theorem.
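
For concreteness, restricting a code to a sub-domain and decoding by nearest neighbor can be sketched in Python as follows. The dictionary representation, the function names and the tie-breaking rule (the first message in iteration order wins) are assumptions of this example, not part of the paper.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two words stored as integers."""
    return bin(a ^ b).count("1")


def restrict(code: dict[int, int], removed: set[int]) -> dict[int, int]:
    """Restrict a code (message -> codeword) to the messages outside `removed`."""
    return {u: word for u, word in code.items() if u not in removed}


def nearest_neighbor_decode(received: int, code: dict[int, int]) -> int:
    """Return a message whose codeword is closest to `received` in Hamming distance."""
    return min(code, key=lambda u: hamming(code[u], received))


if __name__ == "__main__":
    # Hypothetical toy code of length 8; message 3 plays the role of a removed message.
    code = {0: 0b00000000, 1: 0b11110000, 2: 0b00001111, 3: 0b11110001}
    restricted = restrict(code, {3})
    print(nearest_neighbor_decode(0b11100000, restricted))
```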

Let $x \in \{0,1\}^n$ be the codeword sent by Alice and denote by $m_x = x|_{[\alpha n]} \in \{0,1\}^{\alpha n}$ the vector that
Alice sends in the first step of the two-step model. We first show that the probability over Alice's messages
that $m_x \in \bar{M}$ is exponentially decaying:
$\Pr[m_x \in \bar{M}] = \sum_{m \in \bar{M}} \frac{|C^m|}{|U|} \le |\bar{M}| \cdot \frac{2^{\varepsilon n + 1}}{2^{Rn - 1}} \le 2^{-\gamma n + 2}$.
Thus, we may neglect the event that $m_x \in \bar{M}$.

Now assume that $m_x \in M$. Observe that for every $m \in M$ the number of codewords of $C'$ that start
with $m$ satisfies $|C^m \setminus X_m| \ge 2^{\varepsilon n - 1} - 2^{(\varepsilon - \gamma)n} \ge 2^{\varepsilon n - 2}$ for a large enough $n$. In the first step of our
two-step model the channel outputs $m_x + e'$ for some $e' \in \{0,1\}^{\alpha n}$ of Hamming weight at most $pn$. Extend
$e'$ to a vector $e \in \{0,1\}^n$ by concatenating it to $(1 - \alpha)n$ zeros. We now bound the probability of incorrect
decoding averaged over all codewords $x$ with prefix $m_x$. We divide our analysis according to the cases
discussed in Sections 3.2 and 3.3.



For the analysis corresponding to Section 3.2, consider the probability (taken over messages in
$C^{m_x} \setminus X_{m_x}$) that the "forbidden ball" corresponding to $x$ and $e$ contains a codeword with a prefix that differs from
$m_x$. Recall that this probability bounds the probability of a decoding error in the setting of Section 3.2 and,
by our definitions, is at most $\frac{P^{(m_x, e)}}{|C^{m_x} \setminus X_{m_x}|} \le \frac{2^{\varepsilon n / 2}}{2^{\varepsilon n - 2}} = 2^{2 - \varepsilon n / 2}$. Here, the bound on $P^{(m_x, e)}$ holds since the
code satisfies Item (2) in Corollary 3.8.

For the analysis corresponding to Section 3.3, due to our restriction $C'$ of $C$ to $U$ and the assumption
$m_x \in M$, $x$ is the only codeword with prefix $m_x$ and Hamming distance at most $2pn$ from $x$. Hence, the
"forbidden ball" corresponding to $x$ does not contain any other codeword with a prefix that equals $m_x$, implying no
decoding error in the setting examined in Section 3.3.



Therefore, the probability (taken uniformly over Alice's message $u \in U$) of an incorrect decoding is at
most $\Pr[m_x \in \bar{M}] + \Pr[m_x \in M] \cdot 2^{2 - \varepsilon n / 2} \le 2^{-\gamma n + 2} + 1 \cdot 2^{2 - \varepsilon n / 2} = 2^{-\Omega(n)}$. All in all, we obtain that
the probability of a correct decoding is arbitrarily close to 1 for a sufficiently large $n$, which concludes our
proof.

Appendix

A Proof of Claim 3.1



First, by our setting of $p$, notice that $\alpha > \frac{1}{2} + \gamma$ for some $\gamma \in (0, \frac{1}{2})$ that depends only on $p$. Observe that

$$|B_\alpha^{(p,q)}(z)| \le \sum_{k=0}^{(2p - q)n} \; \sum_{i \le \min\{pn, k\}} B_{k,i} ,$$

where $B_{k,i} = \binom{\alpha n}{i} \cdot \binom{(1 - \alpha)n}{k - i}$. Here, with respect to the notation in the definition of $B_\alpha^{(p,q)}(z)$, we set $k$ to
be equal to the Hamming distance between $z$ and $y$, and $i$ to be equal to the Hamming distance between
$z|_{[\alpha n]}$ and $y|_{[\alpha n]}$. Notice that $k \le \mathrm{dist}_H(z, w) + \mathrm{dist}_H(w, y) \le (2p - q)n$ and as $z|_{[\alpha n]} = w|_{[\alpha n]}$ it holds
that $i \le pn$. In order to prove the claim it suffices to prove the existence of an $\eta > 0$ for which for every
$0 \le k \le (2p - q)n$ and $i \le pn$, we have $B_{k,i} \le 2^{(H(2p) - \eta)n}$. Fix such a $k$ and denote $k = (2p - q')n$ for
some $q' \in [q, 2p]$. Consider the following two cases:

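• Case 1: $q' \ge 2\gamma \cdot p$. Denoting $i = \beta k$ for $\beta \in [0, 1]$ we obtain

$$B_{k,i} = \binom{\alpha n}{\beta k} \cdot \binom{(1 - \alpha)n}{(1 - \beta)k} \le 2^{n H(k/n)} = 2^{n H(2p - q')} \le 2^{n (H(2p) - \eta_1)} ,$$

where the first inequality follows from Fact 2.1 and the second holds for $\eta_1 = H(2p) - H((2 - 2\gamma)p)$
since $H$ is monotonically increasing in $[0, \frac{1}{2}]$ and $2p < \frac{1}{2}$. Notice that $\eta_1$ depends solely on $p$.

• Case 2: $q' < 2\gamma \cdot p$. In this case we have $\alpha(2p - q') > p \cdot 2\alpha(1 - \gamma) > p(1 + 2\gamma)(1 - \gamma) > p$.
Denoting $i = (p - \beta)n$ for $\beta \in [0, p]$ such that $p - q' + \beta \le 1 - \alpha$ we obtain

$$B_{k,i} = \binom{\alpha n}{(p - \beta)n} \cdot \binom{(1 - \alpha)n}{(p - q' + \beta)n} \le 2^{\alpha n \cdot H\left(\frac{p - \beta}{\alpha}\right) + (1 - \alpha)n \cdot H\left(\frac{p - q' + \beta}{1 - \alpha}\right)} \le 2^{\alpha n \cdot H\left(\frac{p}{\alpha}\right) + (1 - \alpha)n \cdot H\left(\frac{p - q'}{1 - \alpha}\right)} .$$

To verify the last inequality, one can show that $\alpha(2p - q') > p$ implies that the function $g : [0, p] \to [0, 1]$
defined by $g(\beta) = \alpha \cdot H\left(\frac{p - \beta}{\alpha}\right) + (1 - \alpha) \cdot H\left(\frac{p - q' + \beta}{1 - \alpha}\right)$ is monotonically decreasing, as follows
from calculating its derivative. The assumption $\alpha < 1 - 2p$ implies that $H\left(\frac{p - q'}{1 - \alpha}\right) \le H\left(\frac{p}{1 - \alpha}\right)$ since
$H$ is monotonically increasing in $[0, \frac{1}{2}]$. Now, let $\alpha^*$ be the $\alpha \in [\frac{1}{2} + \gamma, 1 - 2p]$ that maximizes
$\alpha \cdot H\left(\frac{p}{\alpha}\right) + (1 - \alpha) \cdot H\left(\frac{p}{1 - \alpha}\right)$ ² and obtain that

$$B_{k,i} \le 2^{n \left( \alpha^* \cdot H\left(\frac{p}{\alpha^*}\right) + (1 - \alpha^*) \cdot H\left(\frac{p}{1 - \alpha^*}\right) \right)} = 2^{n (H(2p) - \eta_2)} ,$$

where the equality holds for some $\eta_2 > 0$ that depends solely on $p$, as follows from Fact 2.1 using
$\frac{p}{\alpha^*} < \frac{p}{1 - \alpha^*}$.

Choosing $\eta = \min(\eta_1, \eta_2)$ completes the proof.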
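
As a numerical sanity check of the kind of estimate used above, the following Python sketch evaluates $\log_2 B_{k,i}$ over the relevant range and compares its maximum with the cruder exponent $n \cdot H(k_{\max}/n)$. It relies on the standard bound $\binom{n}{k} \le 2^{nH(k/n)}$, which we assume is among the entropy estimates the proof invokes; the function names and toy parameters are hypothetical.

```python
from math import comb, log2


def binary_entropy(x: float) -> float:
    """Binary entropy function H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def log2_term(n: int, prefix_len: int, k: int, i: int) -> float:
    """log2 of B_{k,i} = C(prefix_len, i) * C(n - prefix_len, k - i)."""
    return log2(comb(prefix_len, i) * comb(n - prefix_len, k - i))


def max_exponent(n: int, prefix_len: int, k_max: int, i_max: int) -> float:
    """Largest log2 B_{k,i} over 0 <= k <= k_max and 0 <= i <= min(i_max, k)."""
    best = 0.0
    for k in range(k_max + 1):
        for i in range(min(i_max, k, prefix_len) + 1):
            if k - i <= n - prefix_len:  # skip empty binomials
                best = max(best, log2_term(n, prefix_len, k, i))
    return best


if __name__ == "__main__":
    # Hypothetical toy values: n = 200, alpha*n = 120, (2p - q)n = 40, pn = 25.
    n, prefix_len, k_max, i_max = 200, 120, 40, 25
    print(max_exponent(n, prefix_len, k_max, i_max), n * binary_entropy(k_max / n))
```

The claim asserts more than this crude comparison: restricting $i \le pn$ yields an exponent that is smaller than $H(2p) \cdot n$ by a linear term $\eta n$.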



² It can be seen that $\alpha^* = 1 - 2p$.